Abstract

The minimum message length principle is an information theoretic criterion that links
data compression with statistical inference. This paper studies the strict minimum message
length (SMML) estimator for d-dimensional exponential families with continuous sufficient
statistics, for all d. The partition of an SMML estimator is shown to consist of convex
polytopes (i.e. convex polygons when d = 2). A simple and explicit description of these
polytopes is given in terms of the assertions and coding probabilities, namely that the i-th
polytope is exactly the set of data points where the posterior probability corresponding to
the i-th assertion and i-th coding probability is greater than or equal to the other posteriors.
SMML estimators which partition the data space into n regions are therefore determined
by n(d + 1) numbers which describe the assertions and coding probabilities, and we give
n(d + 1) equations that these numbers must satisfy. Solving these equations with a
quasi-Newton method then gives a practical method for constructing higher-dimensional SMML
estimators.
1 Introduction

The minimum message length (MML) principle [10] is an information theoretic criterion
that links data compression with statistical inference [?]. It has a number of useful
properties and it has close connections with Kolmogorov complexity [11]. Using the MML
principle to construct estimators is known to be NP-hard in general [3] so it is common to
use approximations in practice [5]. The term 'strict minimum message length' (SMML) is
used for the exact MML criterion, to distinguish it from the various approximations.

The only known algorithm for calculating an SMML estimator is Farr's algorithm [3] 
which applies to data taking values in a finite set which is (in some sense) 1-dimensional. 
A method for calculating SMML estimators for 1-dimensional exponential families with 
continuous sufficient statistics was also recently given in [3]. However, calculating SMML 
estimators for higher-dimensional data has been an open problem. 

This paper gives a method for calculating SMML estimators for d-dimensional exponential
families of statistical models with continuous sufficient statistics. Section 2 recalls
the relevant definitions and fixes our notation. Section 3 shows how the expected two-part
code-length $I_1$ changes as the partition is changed by a small amount, though the proof
of the main technical lemma is deferred to Appendix A. Section 4 uses this calculation to
prove our main result, that the partition corresponding to an SMML estimator consists of
certain convex polytopes. Section 5 shows how this can be used in practice to calculate
SMML estimators and Section 6 states the main conclusions from this work.
2 SMML estimators for exponential families

Partly to define our notation, this section briefly recalls the relevant facts about exponential 
families and their SMML estimators. 

Let $X$ and $\Theta$ be the support and natural parameter space of the exponential family
(respectively), which are both open, connected subsets of $\mathbb{R}^d$ with $\Theta$ convex. For each
$\theta \in \Theta$, let $f(x|\theta)$ be the probability density function (PDF) on $X$ given by

$$f(x|\theta) = \exp(x \cdot \theta - \psi(\theta))\, h(x) \tag{1}$$

for any $x \in X$, where the dot denotes the Euclidean inner product, $h : X \to \mathbb{R}$ is a strictly
positive function and $\psi : \Theta \to \mathbb{R}$ is determined by the condition $1 = \int_X f(x|\theta)\,dx$ for every
$\theta \in \Theta$. Let $\mu : \Theta \to \mathbb{R}^d$ be the function

$$\mu(\theta) = \mathrm{E}[X \mid \theta] = \int_X x f(x|\theta)\,dx$$

which relates the natural parametrization of the exponential family to the expectation
parametrization, where $\mathrm{E}[X \mid \theta]$ is the expectation of any random variable with PDF (1).
Then by a standard result for exponential families (e.g. see Theorem 2.2.1 of [?]), $\psi$ is
smooth (i.e. infinitely differentiable), $\mu$ is a diffeomorphism from $\Theta$ to its image (i.e. a
smooth function with a smooth inverse) and

$$\mu(\theta) = \mathrm{E}[X \mid \theta] = \nabla\psi\,|_\theta \tag{2}$$

$$\mathrm{Var}(X \mid \theta) = \mathrm{Hess}(\psi)\,|_\theta \tag{3}$$

where $\mathrm{Var}(X \mid \theta)$ is the variance-covariance matrix of any random variable with PDF (1),
$\nabla\psi$ is the gradient of $\psi$ and $\mathrm{Hess}(\psi)$ is the Hessian matrix of $\psi$.
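
The identities (2) and (3) can be checked numerically for a concrete family. The sketch below assumes the normal family with identity covariance (the family used for Figure 1), for which $\psi(\theta) = \|\theta\|^2/2$, so that (2) should return the mean $\theta$ and (3) the identity matrix. The helper names `psi_normal`, `grad_fd` and `hess_fd` are illustrative and do not come from the paper.

```python
import numpy as np

def psi_normal(theta):
    """Log-partition function psi for the normal family with identity covariance."""
    theta = np.asarray(theta, dtype=float)
    return 0.5 * theta @ theta

def grad_fd(fun, theta, eps=1e-5):
    """Central finite-difference gradient of a scalar function."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (fun(theta + e) - fun(theta - e)) / (2 * eps)
    return g

def hess_fd(fun, theta, eps=1e-4):
    """Central finite-difference Hessian, built column by column from grad_fd."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    H = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        H[:, j] = (grad_fd(fun, theta + e) - grad_fd(fun, theta - e)) / (2 * eps)
    return H

theta = np.array([0.3, -1.2])
mu = grad_fd(psi_normal, theta)    # equation (2): should approximate E[X | theta] = theta
cov = hess_fd(psi_normal, theta)   # equation (3): should approximate Var(X | theta) = I
```

For other exponential families only `psi_normal` would need to be replaced by the corresponding log-partition function.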

Let $\pi$ be a Bayesian prior on $\Theta$ and define the marginal PDF $r$ by

$$r(x) = \int_\Theta \pi(\theta) f(x|\theta)\,d\theta$$

for any $x \in X$. We assume $\pi$ is chosen so that the first moment of $r$ exists.

For the case considered above, an SMML estimator with $n \ge 1$ regions is defined as
follows [5]. Suppose we are given $\theta_1, \ldots, \theta_n \in \Theta$ (the assertions), $q_1, \ldots, q_n \in \mathbb{R}$ so that
$1 = q_1 + \ldots + q_n$ and each $q_i > 0$ (the coding probabilities for the assertions) and a partition
$U_1, \ldots, U_n$ of $X$, i.e. subsets $U_1, \ldots, U_n \subseteq X$ so that $X = \cup_{i=1}^n U_i$ and $U_i \cap U_j$ is a set of
Lebesgue measure 0 for all $i \neq j$. We also assume that each $U_i$ has non-zero measure and
we will place other restrictions on each $U_i$ in Section 3. Let $\theta : X \to \Theta$ and $q : X \to \mathbb{R}$
be the functions which take the values $\theta_i$ and $q_i$ on $U_i$ (respectively), and note that these
definitions make sense except on the set of measure 0 where two or more $U_i$ overlap. If
we discretize the data space $X$ to a lattice then there is a 2-part coding of the data which
has expected length

$$I_1 = -\mathrm{E}[\log(q(X) f(X|\theta(X)))] \tag{4}$$

plus a constant which only depends on the width of the lattice, where $X$ is a random
variable with PDF $r$. Then an SMML estimator with $n$ regions is a function $\theta(x)$ which
minimizes $I_1$ out of all estimators of this form.

The following lemma is a refinement for exponential families of some well-known facts 
about SMML estimators. 

Lemma 1. If an SMML estimator has partition $U_1, \ldots, U_n$, assertions $\theta_1, \ldots, \theta_n$ and
coding probabilities $q_1, \ldots, q_n$ then

$$q_i = \int_{U_i} r(x)\,dx \tag{5}$$

$$\theta_i = \mu^{-1}\left(\frac{1}{q_i}\int_{U_i} x\, r(x)\,dx\right) \tag{6}$$

for each $i = 1, \ldots, n$.

Also, for any partition $U_1, \ldots, U_n$, not necessarily corresponding to an SMML estimator,
if $q_i$ and $\theta_i$ are as in (5) and (6) then

$$I_1 = C - \sum_{i=1}^n q_i\left(\log q_i - \psi(\theta_i) + \theta_i \cdot \mu(\theta_i)\right) \tag{7}$$

where $C$ is the constant $-\int_X r(x) \log h(x)\,dx$.
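Proof. From (4) we have

$$\begin{aligned}
I_1 &= -\int_X r(x)\log(q(x) f(x|\theta(x)))\,dx \\
&= -\sum_{i=1}^n \int_{U_i} r(x)\log(q(x) f(x|\theta(x)))\,dx \\
&= -\sum_{i=1}^n \int_{U_i} r(x)\log(q_i f(x|\theta_i))\,dx \\
&= C - \sum_{i=1}^n \int_{U_i} r(x)\left(\log q_i + x \cdot \theta_i - \psi(\theta_i)\right) dx \quad \text{by (1)} \\
&= C - \sum_{i=1}^n \left[\left(\log q_i - \psi(\theta_i)\right)\int_{U_i} r(x)\,dx + \theta_i \cdot \int_{U_i} x\, r(x)\,dx\right].
\end{aligned} \tag{8}$$

Now, assume $U_1, \ldots, U_n$, $\theta_1, \ldots, \theta_n$ and $q_1, \ldots, q_n$ correspond to an SMML estimator,
i.e. they represent a global minimum of $I_1$. Then it is not possible to reduce $I_1$ by
changing $q_1, \ldots, q_n$ so that $1 = q_1 + \ldots + q_n$ and with $U_1, \ldots, U_n$ and $\theta_1, \ldots, \theta_n$ fixed. So
by the method of Lagrange multipliers, at the SMML estimator the gradient of $I_1$ (with
only $q_1, \ldots, q_n$ varying) should be proportional to the gradient of the constraint function
$q_1 + \ldots + q_n - 1$, i.e. there is some $\lambda \in \mathbb{R}$ so that for all $i = 1, \ldots, n$,

$$\lambda = \lambda\frac{\partial}{\partial q_i}(q_1 + \ldots + q_n - 1) = -\frac{\partial I_1}{\partial q_i} = \frac{1}{q_i}\int_{U_i} r(x)\,dx,$$

where the last step is by (8), so this and the condition $1 = q_1 + \ldots + q_n$ imply (5). Similarly,
$I_1$ cannot be reduced by changing $\theta_i$ while keeping $U_1, \ldots, U_n$, $\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_n$ and
$q_1, \ldots, q_n$ fixed. So if $\theta_i \in \Theta \subseteq \mathbb{R}^d$ has co-ordinates $\theta_i = (\theta_{i1}, \ldots, \theta_{id})$ then by (5) and (8),
for every $j = 1, \ldots, d$,

$$0 = \frac{\partial I_1}{\partial \theta_{ij}} = q_i \frac{\partial \psi}{\partial \theta_j}(\theta_i) - \int_{U_i} x_j\, r(x)\,dx$$

so (6) follows from (2). Lastly, (7) follows from (5), (6) and (8). □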
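
As an illustration of Lemma 1, the quantities (5), (6) and (7) have closed forms in one simple case. The sketch below assumes d = 1, the family $f(x|\theta) = N(\theta, 1)$ (so $\psi(\theta) = \theta^2/2$ and $\mu$ is the identity) and a marginal $r = N(0, s^2)$, with the partition given by breakpoints on the real line. The function name `lemma1_quantities` and these modelling choices are assumptions made for the example, not part of the paper.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def lemma1_quantities(breaks, s):
    """Return (q, theta, I_1 - C) for the partition of R with the given breakpoints.

    Assumes f(x|theta) = N(theta, 1), so psi(theta) = theta**2 / 2 and
    mu(theta) = theta, and a marginal r = N(0, s**2).
    """
    edges = [-math.inf] + sorted(breaks) + [math.inf]
    qs, thetas = [], []
    for a, b in zip(edges[:-1], edges[1:]):
        q = normal_cdf(b / s) - normal_cdf(a / s)                   # equation (5)
        first_moment = s * (normal_pdf(a / s) - normal_pdf(b / s))  # int_a^b x r(x) dx
        qs.append(q)
        thetas.append(first_moment / q)                             # equation (6)
    # Equation (7) with psi(theta) = theta^2 / 2 and theta . mu(theta) = theta^2.
    i1_minus_c = -sum(q * (math.log(q) - 0.5 * t * t + t * t)
                      for q, t in zip(qs, thetas))
    return qs, thetas, i1_minus_c

qs, thetas, value = lemma1_quantities([-1.0, 1.0], s=math.sqrt(2.0))
```

Comparing `value` across different breakpoint choices gives a direct, if crude, way to compare candidate partitions in this one-dimensional setting.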



3 Deformations of the partition 

By Lemma 1, finding an SMML estimator is equivalent to finding a partition of $X$ which
minimizes (7) when $q_i$ and $\theta_i$ are as in (5) and (6). In this section, we consider an estimator
defined by a partition $U_1, \ldots, U_n$ and we calculate how $I_1$ varies as we change the partition
by a small amount. This is interesting because, to first order, $I_1$ should not change under
any small deformation when $U_1, \ldots, U_n$ corresponds to an SMML estimator.

We now place some fairly mild restrictions on the partitions that we consider by assuming
each $U_i$ is a (not necessarily connected) d-manifold in $\mathbb{R}^d$ with a piecewise smooth
boundary $\partial U_i$ (see §3.1 of [?] for a general description of manifolds in $\mathbb{R}^d$). This means
that each $U_i$ is the solid region in $\mathbb{R}^d$ bounded by a $(d-1)$-dimensional set $\partial U_i$ which
locally has the same shape as the graph of some smooth real-valued function defined on a
small ball in $\mathbb{R}^{d-1}$, except on a $(d-2)$-dimensional set where $\partial U_i$ is allowed to have 'ridges'
or 'corners' like those that can occur in the graph of the minimum of a finite number of
smooth (and transverse) functions. We therefore allow each $U_i$ to have a very wide range
of topologies and geometries but we do not consider partitions with fractal boundaries,
for instance. Since we have already assumed that any two regions $U_i$ and $U_j$ overlap in
a set of measure 0, we require that the interiors of $U_i$ and $U_j$ are disjoint and hence that

$$U_i \cap U_j \subseteq \partial U_i \cap \partial U_j.$$

Now, suppose that $U_1$ and $U_2$ share a 'face', i.e. that $\partial U_1 \cap \partial U_2$ contains a smooth,
$(d-1)$-dimensional, curvilinear disc $D$. We will deform the partition $U_1, \ldots, U_n$ by perturbing
$D$ slightly.

Let $N$ be the unit normal vector field on $D$ which points out of $U_1$ and into $U_2$, and
extend $N$ in any way to a smooth vector field defined on all of $\mathbb{R}^d$. Let $g : \mathbb{R}^d \to \mathbb{R}$ be
any smooth function so that $g(x) = 0$ except perhaps in a closed and bounded subset $\mathrm{Supp}(g)$ of
$\mathbb{R}^d$ (the support of $g$) which is contained in $U_1 \cup U_2$ and which only meets $\partial U_1 \cup \partial U_2$ in a
subset of $D$.

For all real $t$ close to 0, let $F_t : \mathbb{R}^d \to \mathbb{R}^d$ be the flow of the vector field $gN$, i.e. for
given $x \in \mathbb{R}^d$, let $F_t(x)$ be the position of a particle in $\mathbb{R}^d$ which starts at $x$ and whose
velocity at time $t$ is $gN$ evaluated at the position of the particle (see §3.9 of [1]). Each $F_t$
is a diffeomorphism from $\mathbb{R}^d$ to itself and it is given by $F_t(x) = x + t g(x) N(x)$ to first
order in $t$, for small $t$. If we define

$$U_i(t) = F_t(U_i)$$

then $U_1(t), \ldots, U_n(t)$ is also a partition of $X$ for each $t$ (since $F_t$ is a diffeomorphism).
Also, $F_0$ is the identity so $U_i(0) = U_i$ for all $i = 1, \ldots, n$. Therefore we can consider
$U_1(t), \ldots, U_n(t)$ to be a deformation of the partition $U_1, \ldots, U_n$. Also, because of the
restrictions on $\mathrm{Supp}(g)$ above, $U_i(t) = U_i$ for all $t$ if $i \neq 1, 2$ and $D$ is the only part of
$\partial U_1 \cup \partial U_2$ which changes as $t$ is varied.
We now have the following key lemma.

Lemma 2. Let $i$ be 1 or 2 and let $c(t) = \int_{U_i(t)} \rho(x)\,dx$ for some smooth function
$\rho : \mathbb{R}^d \to \mathbb{R}$. Then

$$\left.\frac{dc}{dt}\right|_{t=0} = \pm\int_D g\rho\,dx \quad \text{and} \quad \left.\frac{d^2c}{dt^2}\right|_{t=0} = \pm\int_D g\,\nabla \cdot (g\rho N)\,dx$$

where $\nabla \cdot (g\rho N)$ is the divergence of $g\rho N$ and both signs are positive if $i = 1$ and both are
negative if $i = 2$.

Remark 1. In Lemma 2 and throughout this paper, we will often denote the integral of a
function $\phi(x)$ over a subset $\Omega$ of $\mathbb{R}^d$ by $\int_\Omega \phi\,dx$ rather than $\int_\Omega \phi(x)\,dx$.

Proof of Lemma 2. See Sections 4.1, 6.2 and 6.3 of [2], especially equations 4.7, 6.9, 6.14
and 6.15. The cases $i = 1$ and $i = 2$ have different signs because $N$ is the outward-pointing
unit normal for $U_1$ but the inward-pointing unit normal for $U_2$. See the Appendix for an
alternative proof of this key lemma. □

The following theorem gives the first and second variations of $I_1$ corresponding to the
above deformation of the partition, for small $t$.

Theorem 3. For all $t \in \mathbb{R}$ close to 0, let $U_1(t), \ldots, U_n(t)$ be the partition given above.
Let $q_i$ and $\theta_i$ be the functions of $t$ given by (5) and (6) but with $U_i(t)$ replacing $U_i$, and let
$I_1$ be the function of $t$ obtained by substituting these functions into (7). If $\dot I_1$ and $\ddot I_1$ are
(respectively) the first and second derivatives of $I_1$ with respect to $t$ then

$$-\dot I_1\big|_{t=0} = \int_D r g (\lambda_1 - \lambda_2)\,dx \tag{9}$$

and

$$\begin{aligned}
-\ddot I_1\big|_{t=0} &= \int_D (\lambda_1 - \lambda_2)\, g\,\nabla \cdot (g r N)\,dx + \left(\frac{1}{q_1} + \frac{1}{q_2}\right)\left(\int_D g r\,dx\right)^2 \\
&\quad + \frac{1}{q_1} S_1 \cdot Q_1^{-1} S_1 + \frac{1}{q_2} S_2 \cdot Q_2^{-1} S_2 + \int_D g^2 r N \cdot (\theta_1 - \theta_2)\,dx
\end{aligned} \tag{10}$$

where $\lambda_i(x) = \log q_i + x \cdot \theta_i - \psi(\theta_i)$, $S_i = \int_D (x - \mu(\theta_i))\, g r\,dx$ and $Q_i$ is the Hessian of $\psi$
evaluated at $\theta_i$ (so $Q_i$ is symmetric and positive-definite by (3)).

Proof. See Appendix A. □
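
As a sanity check, Lemma 2 and formula (9) can be worked out by hand when d = 1. The following sketch assumes (this special case is not treated in the text) that $U_1 = (a, b]$ and $U_2 = [b, c)$ are adjacent intervals, so that $D = \{b\}$, $N = 1$ and integrals over $D$ reduce to evaluation at $b$. The endpoint $a$ does not move because $g$ vanishes near it.

```latex
% d = 1: the boundary point b(t) = F_t(b) moves with velocity g,
% i.e. b'(t) = g(b(t)) with b(0) = b.
\begin{align*}
c(t) &= \int_a^{b(t)} \rho(x)\,dx, \\
\left.\frac{dc}{dt}\right|_{t=0} &= \rho(b)\,g(b) = \int_D g\rho\,dx, \\
\left.\frac{d^2c}{dt^2}\right|_{t=0}
  &= \left.\frac{d}{dt}\Big[(g\rho)(b(t))\Big]\right|_{t=0}
   = g(b)\,(g\rho)'(b) = \int_D g\,\nabla\cdot(g\rho N)\,dx .
\end{align*}
% For i = 2 the same point is the lower limit of the integral, which flips both signs.
% Using \rho = r and \rho = x r as in the Appendix, formula (9) reduces to
\begin{equation*}
-\dot I_1\big|_{t=0} = r(b)\,g(b)\,\big(\lambda_1(b) - \lambda_2(b)\big),
\end{equation*}
% so, since r > 0, the first variation vanishes for every g exactly when
% \lambda_1(b) = \lambda_2(b).
```

In this one-dimensional picture the stationarity condition says that the breakpoint between two regions sits where the two functions $\lambda_1$ and $\lambda_2$ cross.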



4 The partition of an SMML estimator 

We can now prove our main theorem. 

Theorem 4. If an SMML estimator has partition $U_1, \ldots, U_n$, assertions $\theta_1, \ldots, \theta_n$ and
coding probabilities $q_1, \ldots, q_n$ then

$$U_i = \{x \in X \mid \lambda_i(x) \ge \lambda_j(x) \text{ for all } j = 1, \ldots, n\}$$

where $\lambda_i$ is the linear function of $x$ given by $\lambda_i(x) = \log q_i + x \cdot \theta_i - \psi(\theta_i)$. In particular,
each $U_i$ is a convex polytope determined by $\theta_1, \ldots, \theta_n$ and $q_1, \ldots, q_n$.

Proof. As in the statement, let an SMML estimator have partition $U_1, \ldots, U_n$, assertions
$\theta_1, \ldots, \theta_n$ and coding probabilities $q_1, \ldots, q_n$. If we define

$$V_i = \{x \in X \mid \lambda_i(x) \ge \lambda_j(x) \text{ for all } j = 1, \ldots, n\}$$

then our goal is to prove that $U_i = V_i$ for all $i = 1, \ldots, n$.

For $i \neq j$, let $P_{ij} = \{x \in \mathbb{R}^d \mid \lambda_i(x) = \lambda_j(x)\}$ and $H_{ij} = \{x \in \mathbb{R}^d \mid \lambda_i(x) \ge \lambda_j(x)\}$.
For any $i = 1, \ldots, n$, let $C^{(i)}_1, \ldots, C^{(i)}_{m_i}$ be the closures of the connected components of
$\mathbb{R}^d \setminus (\cup_{j: j \neq i} P_{ij})$, i.e. each $C^{(i)}_k$ is a d-dimensional convex polytope with boundary lying in
$\cup_{j: j \neq i} P_{ij}$ but whose interior is disjoint from all of these hyperplanes.

Claim 1: $U_i$ is the union of one or more of $C^{(i)}_1, \ldots, C^{(i)}_{m_i}$. Assume without loss of
generality that $i = 1$ and that $U_1$ meets $U_2$ in a $(d-1)$-dimensional face. As in Section 3,
let $U_1(t), \ldots, U_n(t)$ be a deformation of the partition $U_1, \ldots, U_n$ corresponding to some $g$,
$N$ and $D$ and let $q_i$, $\theta_i$ and $I_1$ be functions of $t$ as in Theorem 3. An SMML estimator is
a global minimum of $I_1$ so $\dot I_1 = 0$ and $\ddot I_1 \ge 0$ at $t = 0$ for all deformations, so in particular
these relations hold for any $g$, $N$ and $D$. But by (9), $-\dot I_1 = \int_D r g (\lambda_1 - \lambda_2)\,dx$, and this
integral can vanish for all $g$ only if the integrand vanishes on $D$. So since $r > 0$, we must
have $\lambda_1(x) = \lambda_2(x)$ for all $x \in D$, i.e. $D$ is contained in the hyperplane $P_{12}$.

Since $D$ is an arbitrary smooth $(d-1)$-dimensional disc contained in $U_1 \cap U_2$, this shows
that all of $U_1 \cap U_2$ is contained in $P_{12}$ except perhaps a set of dimension $d - 2$ where $\partial U_1$
or $\partial U_2$ is not smooth or where $\partial U_1 \cap \partial U_2$ has dimension $d - 2$ or less. Therefore a dense
subset of $U_1 \cap U_2$ is contained in $P_{12}$. But $P_{12}$ is closed so this implies that $U_1 \cap U_2 \subseteq P_{12}$.
Similar comments hold with $U_2$ replaced by any $U_j$ which shares a $(d-1)$-dimensional
face with $U_i$. Therefore $\partial U_i \subseteq \cup_{j: j \neq i} P_{ij}$ and the claim follows.
Claim 2: If $C^{(i)}_k \subseteq U_i$, $C^{(i)}_k$ meets $P_{ij}$ in a $(d-1)$-dimensional set and $C^{(i)}_k \cap P_{ij} \subseteq \partial U_i$
then $C^{(i)}_k \subseteq H_{ij}$. Assume, without loss of generality, that $i = 1$ and $j = 2$. Let $g$, $N$ and $D$
be as above and let $C^{(1)}_k$ be as in the statement of this claim. Since $D$ is arbitrary, we can
assume additionally that $D \subseteq C^{(1)}_k \cap P_{12}$. Since $D \subseteq P_{12}$, the vector field $N$ is constant
on $D$ and $\lambda_1 = \lambda_2$ there, so by (10), $\ddot I_1 \ge 0$ is equivalent to

$$0 \ge -\ddot I_1 = \left(\frac{1}{q_1} + \frac{1}{q_2}\right)\left(\int_D g r\,dx\right)^2 + \frac{1}{q_1} S_1 \cdot Q_1^{-1} S_1 + \frac{1}{q_2} S_2 \cdot Q_2^{-1} S_2 + N \cdot (\theta_1 - \theta_2)\int_D g^2 r\,dx$$

and hence

$$N \cdot (\theta_2 - \theta_1)\int_D g^2 r\,dx \ge \left[\frac{1}{q_1} + \frac{1}{q_2}\right]\left(\int_D g r\,dx\right)^2 + \frac{1}{q_1} S_1 \cdot Q_1^{-1} S_1 + \frac{1}{q_2} S_2 \cdot Q_2^{-1} S_2. \tag{11}$$

By Theorem 3, $Q_1$ and $Q_2$ are positive definite, so all terms on the right-hand side of (11)
are non-negative for all $g$ and strictly positive for some $g$, so

$$N \cdot (\theta_2 - \theta_1) > 0. \tag{12}$$

Now, $\nabla(\lambda_1(x) - \lambda_2(x)) = \theta_1 - \theta_2$ so $\lambda_1 - \lambda_2$ is increasing in the direction $\theta_1 - \theta_2$. Since
$\lambda_1 - \lambda_2 = 0$ on $D$, $\lambda_1 > \lambda_2$ locally on the side of $D$ into which $\theta_1 - \theta_2$ points. But this
must be the $U_1$ side of $D$, since $N \cdot (\theta_2 - \theta_1) > 0$ and $N$ is, by definition, the unit normal
to $D$ which points out of $U_1$ and into $U_2$. But $C^{(1)}_k$ lies entirely on one side or the other
of $P_{12}$, so the fact that part of it lies on the side where $\lambda_1 > \lambda_2$ implies all of it does, i.e.
$C^{(1)}_k \subseteq H_{12}$.

Claim 3: If $V_i$ has non-zero measure then $V_i \subseteq U_i$. Assume $V_i$ has non-zero measure.
Note that this implies $V_i = C^{(i)}_k$ for some $k$. For each $C^{(i)}_k$, let $\#(C^{(i)}_k)$ be the number of
half-spaces $H_{ij}$ (for $i$ fixed and $j$ varying) so that $C^{(i)}_k \subseteq H_{ij}$. Then $\#(C^{(i)}_k) \le n - 1$ and
$\#(C^{(i)}_k) = n - 1$ if and only if $C^{(i)}_k$ lies in all $H_{ij}$ for $j \neq i$, i.e. $C^{(i)}_k = \cap_{j: j \neq i} H_{ij} = V_i$. Let
$C^{(i)}_k$ have maximal $\#(C^{(i)}_k)$ out of all $C^{(i)}_1, \ldots, C^{(i)}_{m_i}$ which are contained in $U_i$. If $C^{(i)}_k = V_i$
then $V_i \subseteq U_i$ and the claim is proved so assume, in order to derive a contradiction, that
$C^{(i)}_k \neq V_i$.

If $C^{(i)}_k = \cap_{j \in J} H_{ij}$ for some $J \subseteq \{1, \ldots, n\} \setminus \{i\}$ then $C^{(i)}_k = \cap_{j \in J} H_{ij} \supseteq \cap_{j: j \neq i} H_{ij} = V_i$,
but then $C^{(i)}_k = V_i$ since distinct $C^{(i)}_1, \ldots, C^{(i)}_{m_i}$ only overlap in sets of measure 0. Taking $J$
to be the set of all $j$ so that $C^{(i)}_k \cap P_{ij}$ is a $(d-1)$-dimensional face, we therefore see that
there is some $j \neq i$ so that $F = C^{(i)}_k \cap P_{ij}$ is a $(d-1)$-dimensional face of $C^{(i)}_k$ but $C^{(i)}_k$
is not contained in $H_{ij}$. By Claim 2, $F$ cannot be contained in $\partial U_i$, so $C^{(i)}_l \subseteq U_i$ where
$C^{(i)}_l$ lies on the opposite side of $F$ to $C^{(i)}_k$. But $\#(C^{(i)}_l) = \#(C^{(i)}_k) + 1$ since $C^{(i)}_l$ is on
the same side of every $P_{ij'}$ as $C^{(i)}_k$ except $P_{ij}$, and $C^{(i)}_l \subseteq H_{ij}$ while $C^{(i)}_k \not\subseteq H_{ij}$. But this
contradicts our choice of $C^{(i)}_k$ as one of the $C^{(i)}_1, \ldots, C^{(i)}_{m_i}$ contained in $U_i$ with maximal
$\#(C^{(i)}_k)$. Therefore $C^{(i)}_k = V_i$ and the claim is proved.

Claim 4: Each $V_i$ has non-zero measure. Suppose, in order to derive a contradiction,
that $V_1$ has zero measure. If $V_1, \ldots, V_k$ have zero measure and $V_{k+1}, \ldots, V_n$ have non-zero
measure for some $k \ge 1$ then $V_i \subseteq U_i$ for all $i > k$ by Claim 3. But $U_1$ meets each $U_j$ (for
$j \neq 1$) in a set of zero measure, so $U_1$ also meets each $V_i$ in a set of zero measure when
$i > k$. Also, $V_1, \ldots, V_k$ all have zero measure so $U_1$ must meet them in sets of zero measure,
too. Hence $U_1 \subseteq X$ meets $\cup_{i=1}^n V_i = X$ in a set of zero measure, so $U_1$ has zero measure.
But this contradicts the fact that $U_1, \ldots, U_n$ is a partition, so the claim is proved.

Claim 5: $V_i = U_i$. By Claims 3 and 4, $V_i \subseteq U_i$. So by Claim 1, if some $V_i \neq U_i$ then
there exists some $C^{(i)}_k$ which is contained in $U_i$ but meets $V_i$ in a set of measure 0. But
$V_1, \ldots, V_n$ is a partition of $X$ so there is some $j \neq i$ so that $C^{(i)}_k$ meets $V_j$ in a set $C$ of
non-zero measure. But $V_j \subseteq U_j$ so $C$ lies in both $U_i$ and $U_j$, contradicting the fact that
$U_1, \ldots, U_n$ is a partition and proving the claim and hence the Theorem. □

Theorem 4 is illustrated in Figure 1. This figure shows an SMML estimator for
2-dimensional normal data with mean $\theta$ governed by a normal prior and variance-covariance
matrix equal to the identity. The data space was broken into 12,000 regions (represented
by coloured dots in the figure) and each one was randomly assigned one of n = 8 colours.
Changes to the colour of individual points which resulted in lower $I_1$ were made until
no more such changes were possible. The coding probabilities and assertions were then
calculated from (5) and (6) and these were used to find the partition (black lines) predicted
by Theorem 4, showing an excellent match between the discrete and predicted partitions.

Remark 2. By (1), $\log(q_i f(x|\theta_i)) = \log h(x) + \lambda_i(x)$, so Theorem 4 is equivalent to the
statement that

$$U_i = \{x \in X \mid q_i f(x|\theta_i) \ge q_j f(x|\theta_j) \text{ for all } j = 1, \ldots, n\} \tag{13}$$

i.e. $U_i$ is exactly the set of points where the posterior probability $q_i f(x|\theta_i)$ corresponding
to $q_i$ and $\theta_i$ is greater than or equal to the posterior probability corresponding to any other
$q_j$ and $\theta_j$.
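
The description of the partition in Theorem 4 translates directly into a rule for assigning data points to regions: evaluate every $\lambda_i$ at $x$ and take the largest. The sketch below is one possible way to write this; the function name `region_index` and the example values are illustrative assumptions, and ties on region boundaries are broken arbitrarily by `argmax`.

```python
import numpy as np

def region_index(x, thetas, qs, psi):
    """Index i maximizing lambda_i(x) = log q_i + x . theta_i - psi(theta_i), per row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))               # shape (m, d)
    thetas = np.asarray(thetas, dtype=float)                    # shape (n, d)
    offsets = np.log(qs) - np.array([psi(t) for t in thetas])   # shape (n,)
    lam = x @ thetas.T + offsets                                # shape (m, n)
    return np.argmax(lam, axis=1)

def psi_normal(theta):
    """psi for the normal family with identity covariance (an assumed example family)."""
    return 0.5 * float(theta @ theta)

# Two assertions placed symmetrically about the origin with unequal coding probabilities.
example_thetas = [[-1.0, 0.0], [1.0, 0.0]]
labels = region_index([[1.0, 0.0], [1.2, 0.0]], example_thetas, [0.9, 0.1], psi_normal)
```

With unequal coding probabilities the separating hyperplane $P_{12}$ shifts away from the assertion with the larger $q_i$, which is why the two nearby points in the example can fall into different regions.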

Remark 3. The version (13) of Theorem 4 might generalize to non-exponential families
of statistical models. If this is true then the partitions for these models will not, in general,
consist of polytopes.

Remark 4. Theorem 4 also implies that each $U_i$ is the projection to $\mathbb{R}^d$ of one of the
facets (i.e. d-dimensional faces) of the convex polytope

$$\{(x, y) \in \mathbb{R}^d \times \mathbb{R} \mid y \ge \lambda_i(x) \text{ for all } i = 1, \ldots, n\}.$$

This description is useful in practice when trying to construct the partition corresponding
to given assertions and coding probabilities.

We also have the following corollary, which generalizes the one given in [3] to higher
dimensions. Recall from Section 2 that $q(x)$ and $\theta(x)$ are step functions which are constant
on the interior of each $U_i$ and are not defined on $\cup_{i=1}^n \partial U_i$, where two or more of the $U_i$
overlap.

Corollary 5. For an SMML estimator,

$$q(x) f(x|\theta(x)) = \max_{i \in \{1, \ldots, n\}} h(x) \exp(\lambda_i(x))$$

for all $x$ in the dense subset of $X$ where the left-hand side is defined. So even though
$q(x) f(x|\theta(x))$ is composed of step functions, it extends continuously to all of $X$.

Proof. Note the left-hand side of the equation in the statement is defined exactly when $x$
lies in the interior of some $U_j$. But in that case,

$$\log\left[q(x) f(x|\theta(x))\right] = \log\left[q_j f(x|\theta_j)\right] = \log h(x) + \lambda_j(x) = \log h(x) + \max_{i \in \{1, \ldots, n\}} \lambda_i(x)$$

where the second equality used (1) and the last equality used Theorem 4. Since this
formula holds for all $j$, taking exponentials completes the proof. □

Lastly, the condition $\ddot I_1 \ge 0$ on the second variation of $I_1$ gives us the following
inequality.
Figure 1: The partition of an SMML estimator for 2-dimensional normal data with known
variance and a normal prior for the mean.
Lemma 6. For an SMML estimator, if $U_i$ and $U_j$ share a $(d-1)$-dimensional face then

$$\frac{\|\theta_i - \theta_j\|}{r(x)} \ge \frac{1}{q_i}\left[1 + (x - \mu_i) \cdot Q_i^{-1}(x - \mu_i)\right] + \frac{1}{q_j}\left[1 + (x - \mu_j) \cdot Q_j^{-1}(x - \mu_j)\right]$$

for any $x \in U_i \cap U_j$, where $\mu_i = \mu(\theta_i)$ and $Q_i$ is the Hessian of $\psi$ evaluated at $\theta_i$.

Proof. Assume, without loss of generality, that $i = 1$ and $j = 2$. Let $g$, $N$ and $D$ be as
in Section 3 so, as in the proof of Theorem 4, $\ddot I_1 \ge 0$ becomes the inequality (11). Now,
$D \subseteq P_{12}$ implies that $N$ is proportional to $\theta_2 - \theta_1$ so the inequality (12) and the fact that
$N$ is a unit vector gives $N \cdot (\theta_2 - \theta_1) = \|\theta_2 - \theta_1\|$. Combining this with (11) gives

$$\|\theta_2 - \theta_1\| \int_D g^2 r\,dx \ge \left[\frac{1}{q_1} + \frac{1}{q_2}\right]\left(\int_D g r\,dx\right)^2 + \frac{1}{q_1} S_1 \cdot Q_1^{-1} S_1 + \frac{1}{q_2} S_2 \cdot Q_2^{-1} S_2 \tag{14}$$

for any $g$ and $D$. Then take a sequence of $g$ approaching a Dirac delta function to get the
desired result at any $x \in D$. But $D$ was arbitrary so the lemma holds for $x$ in a dense
subset of $U_1 \cap U_2$, hence it holds for all $x \in U_1 \cap U_2$. □



5 Constructing SMML estimators 

The usual approach to constructing an SMML estimator is to use (5) and (6) to, in effect,
parameterize the assertions and coding probabilities by the partition and then to try to
find the partition which minimizes the expression (7) for $I_1$ [5, ?, 3]. Theorem 4 allows us
to reverse this approach, i.e. to use the assertions and coding probabilities to parameterize
the partition. This is useful when $d \ge 2$ because then the set of all possible partitions is
infinite dimensional while the assertions and coding probabilities are described by $n(d + 1)$
numbers. With this parametrization, (5) and (6) become $n(d + 1)$ equations which are
satisfied at the SMML estimator. It is therefore possible to find the SMML estimator
for a given number $n$ of regions by solving these equations (maybe with a quasi-Newton
method).

In the case $d = 1$, the above approach finds an SMML estimator by solving $2n$ equations
in $2n$ unknowns while the approach of [3] for the same problem solves $n - 1$ equations in
$n - 1$ unknowns. Therefore the method of [3] is probably more efficient than the one above
for 1-dimensional problems.
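
To make the idea of solving (5) and (6) in the variables $(\theta_i, q_i)$ concrete, the sketch below uses a simple fixed-point iteration rather than the quasi-Newton method suggested above: it alternates between the partition of Theorem 4 and the updates (5) and (6). It rests on several assumptions that are not part of the text: the normal family with identity covariance (so $\psi(\theta) = \|\theta\|^2/2$ and $\mu^{-1}$ is the identity), the marginal $r$ replaced by the empirical measure of a Monte Carlo sample, and a hypothetical function name `smml_fixed_point`. Such an iteration can only be expected to reach a stationary point of $I_1$, not necessarily the global minimum that defines an SMML estimator.

```python
import numpy as np

def smml_fixed_point(samples, n_regions, n_iter=50, seed=0):
    """Alternate between the partition of Theorem 4 and the updates (5)-(6).

    Assumes psi(theta) = |theta|^2 / 2 and approximates r by the empirical
    measure of `samples` (shape (m, d)).  Returns (thetas, qs, history), where
    history holds I_1 - C from equation (7) after each sweep.
    """
    rng = np.random.default_rng(seed)
    m = samples.shape[0]
    thetas = samples[rng.choice(m, size=n_regions, replace=False)].copy()
    qs = np.full(n_regions, 1.0 / n_regions)
    history = []
    for _ in range(n_iter):
        # Theorem 4: assign each point to the region with the largest lambda_i(x).
        with np.errstate(divide="ignore"):
            log_q = np.log(qs)                    # -inf for regions that became empty
        lam = samples @ thetas.T + log_q - 0.5 * np.sum(thetas**2, axis=1)
        labels = np.argmax(lam, axis=1)
        # Equations (5) and (6): region masses and means under the empirical r.
        counts = np.bincount(labels, minlength=n_regions)
        qs = counts / m
        for i in np.flatnonzero(counts):
            thetas[i] = samples[labels == i].mean(axis=0)
        # Equation (7) up to the constant C, summed over the non-empty regions.
        live = counts > 0
        history.append(-np.sum(qs[live] * (np.log(qs[live])
                                           + 0.5 * np.sum(thetas[live]**2, axis=1))))
    return thetas, qs, history

# Example: marginal r = N(0, (1 + 4) I), i.e. a N(0, 4 I) prior on the mean.
sample_rng = np.random.default_rng(1)
samples = sample_rng.normal(scale=np.sqrt(1.0 + 4.0), size=(4000, 2))
thetas, qs, history = smml_fixed_point(samples, n_regions=5)
```

Each sweep cannot increase the empirical value of $I_1 - C$, because the assignment step and the update step each minimize it with the other held fixed. A quasi-Newton solver applied to the $n(d + 1)$ equations would be the closer match to the approach described in the text.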



6 Summary and conclusions 

We studied SMML estimators for d-dimensional exponential families with continuous
sufficient statistics. Because the data space is continuous, we could use methods from calculus
to study how the expected two-part code length $I_1$ changed under small deformations of
the partition. Since SMML estimators are global minima of $I_1$, all deformations of the
partition of an SMML estimator must satisfy the conditions $\dot I_1 = 0$ and $\ddot I_1 \ge 0$ on the first
and second variations of $I_1$. These conditions were then used to prove that the partition of
an SMML estimator consists of certain convex polytopes determined by its assertions and
coding probabilities. We further gave equations which these assertions and coding
probabilities must satisfy, thereby providing a method for constructing SMML estimators for
exponential families in practice. While the results given here apply for all d, this approach
is probably less efficient than the one given in [3] when d = 1.

Our results rest on the assumption that each set $U_i$ in the partition is a d-manifold with
piecewise smooth boundary. This is a mild assumption, since it still allows each $U_i$ to
have a very wide range of topologies and geometries. However, properly speaking, we have
only shown that convex polytopes minimize $I_1$ out of all partitions consisting of d-manifolds
with piecewise smooth boundaries. It is therefore possible (though maybe unlikely) that
other partitions, such as those with fractal boundaries, might give smaller expected
code-lengths. To address this possibility, it might be possible to use methods from geometric
measure theory to generalize our results to partitions consisting of rectifiable sets or more
general objects. However, this would still leave open the possibility that a smaller $I_1$ could
be found by allowing even more general partitions. Also, while most geometric measure
theory textbooks allow very general sets $U_i$, they unfortunately seem to derive Lemma 2
only for $\rho$ an integer-valued function [?, ?, 7].



A Proofs of technical lemmas 

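We first prove the first and second variation formulae for $I_1$ under small deformations.
Proof of Theorem 3. Let $\mu_i = \mu(\theta_i)$. Then by (7),

$$C - I_1 = \sum_{i=1}^n \left(q_i\left(\log q_i - \psi(\theta_i)\right) + \theta_i \cdot q_i \mu_i\right)$$

so differentiating both sides with respect to $t$ and denoting derivatives with dots gives

$$-\dot I_1 = \sum_{i=1}^n \left(\dot q_i\left(\log q_i - \psi(\theta_i)\right) + q_i\left(\frac{\dot q_i}{q_i} - \frac{d}{dt}\psi(\theta_i)\right) + \dot\theta_i \cdot q_i \mu_i + \theta_i \cdot \frac{d}{dt}(q_i \mu_i)\right).$$

Now,

$$\frac{d}{dt}\psi(\theta_i) = \dot\theta_i \cdot \mu_i$$

by the chain rule and (2), and $0 = \sum_{i=1}^n \dot q_i$ since $1 = \sum_{i=1}^n q_i$ for all $t$. Therefore

$$-\dot I_1 = \sum_{i=1}^n \left(\dot q_i\left(\log q_i - \psi(\theta_i)\right) + \theta_i \cdot \frac{d}{dt}(q_i \mu_i)\right). \tag{15}$$

Differentiating this again and denoting second derivatives by double dots gives

$$-\ddot I_1 = \sum_{i=1}^n \left(\ddot q_i\left(\log q_i - \psi(\theta_i)\right) + \dot q_i\left(\frac{\dot q_i}{q_i} - \dot\theta_i \cdot \mu_i\right) + \dot\theta_i \cdot \frac{d}{dt}(q_i \mu_i) + \theta_i \cdot \frac{d^2}{dt^2}(q_i \mu_i)\right). \tag{16}$$

We now apply Lemma 2 to calculate the derivatives of $q_i$ and $q_i \mu_i$. Setting $\rho = r$ in
this lemma gives

$$\dot q_1 = -\dot q_2 = \int_D g r\,dx \tag{17}$$

$$\ddot q_1 = -\ddot q_2 = \int_D g\,\nabla \cdot (g r N)\,dx \tag{18}$$

where all the derivatives are evaluated at $t = 0$. Setting $\rho(x) = x_j r(x)$ in Lemma 2, where
$x = (x_1, \ldots, x_d) \in \mathbb{R}^d$, gives us the first and second derivatives of the $j$-th component of
$q_i \mu_i$. Putting these components together gives

$$\frac{d}{dt}(q_1 \mu_1) = -\frac{d}{dt}(q_2 \mu_2) = \int_D x g r\,dx \tag{19}$$

$$\frac{d^2}{dt^2}(q_1 \mu_1) = -\frac{d^2}{dt^2}(q_2 \mu_2) = \int_D \left(g^2 r N + x g\,\nabla \cdot (g r N)\right) dx \tag{20}$$

where, again, all the derivatives are evaluated at $t = 0$. Here we have used the fact that
$\nabla \cdot (x_j g r N) = g r N \cdot \nabla x_j + x_j \nabla \cdot (g r N)$.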
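When $i \neq 1, 2$, $U_i(t) = U_i$ for all $t$ so all derivatives of $q_i$ and $q_i \mu_i$ vanish in this case.
Then combining (15), (17) and (19) gives

$$\begin{aligned}
-\dot I_1\big|_{t=0} &= \left(\log q_1 - \psi(\theta_1)\right)\left(\int_D g r\,dx\right) + \theta_1 \cdot \left(\int_D x g r\,dx\right) \\
&\quad - \left(\int_D g r\,dx\right)\left(\log q_2 - \psi(\theta_2)\right) - \theta_2 \cdot \left(\int_D x g r\,dx\right) \\
&= \int_D g r\left(\log q_1 - \psi(\theta_1) + \theta_1 \cdot x - \log q_2 + \psi(\theta_2) - \theta_2 \cdot x\right) dx \\
&= \int_D r g (\lambda_1 - \lambda_2)\,dx
\end{aligned}$$

as required, where $\lambda_i(x) = \log q_i + x \cdot \theta_i - \psi(\theta_i)$.
Substituting (17), (18), (19) and (20) into (16) gives

$$\begin{aligned}
-\ddot I_1\big|_{t=0} &= \left(\log q_1 - \psi(\theta_1)\right)\int_D g\,\nabla \cdot (g r N)\,dx + \frac{1}{q_1}\left(\int_D g r\,dx\right)^2 - \left(\int_D g r\,dx\right)(\dot\theta_1 \cdot \mu_1) \\
&\quad + \dot\theta_1 \cdot \int_D x g r\,dx + \theta_1 \cdot \int_D \left(g^2 r N + x g\,\nabla \cdot (g r N)\right) dx \\
&\quad - \left(\log q_2 - \psi(\theta_2)\right)\int_D g\,\nabla \cdot (g r N)\,dx + \frac{1}{q_2}\left(\int_D g r\,dx\right)^2 + \left(\int_D g r\,dx\right)(\dot\theta_2 \cdot \mu_2) \\
&\quad - \dot\theta_2 \cdot \int_D x g r\,dx - \theta_2 \cdot \int_D \left(g^2 r N + x g\,\nabla \cdot (g r N)\right) dx \\
&= \int_D \left(\log q_1 - \psi(\theta_1) + \theta_1 \cdot x - \log q_2 + \psi(\theta_2) - \theta_2 \cdot x\right) g\,\nabla \cdot (g r N)\,dx \\
&\quad + \left(\frac{1}{q_1} + \frac{1}{q_2}\right)\left(\int_D g r\,dx\right)^2 + \dot\theta_1 \cdot \int_D (x - \mu_1) g r\,dx - \dot\theta_2 \cdot \int_D (x - \mu_2) g r\,dx \\
&\quad + \int_D g^2 r N \cdot (\theta_1 - \theta_2)\,dx \\
&= \int_D (\lambda_1 - \lambda_2)\, g\,\nabla \cdot (g r N)\,dx + \left(\frac{1}{q_1} + \frac{1}{q_2}\right)\left(\int_D g r\,dx\right)^2 + \dot\theta_1 \cdot S_1 - \dot\theta_2 \cdot S_2 \\
&\quad + \int_D g^2 r N \cdot (\theta_1 - \theta_2)\,dx.
\end{aligned}$$

Now, $\mu_i = \mu(\theta_i)$ so $\dot\mu_i = J_i \dot\theta_i$ where $J_i$ is the Jacobian matrix of $\mu$ evaluated at $\theta_i$. But
by (2) and (3), $J_i = Q_i$ and $Q_i$ is symmetric and positive definite. This implies that $Q_i$
is invertible so $\dot\theta_i = Q_i^{-1} \dot\mu_i$. Writing the left-hand side of (19) as $\dot q_1 \mu_1 + q_1 \dot\mu_1$, rearranging
and using (17) gives

$$\dot\mu_1 = \frac{1}{q_1}\left(\int_D x g r\,dx - \mu_1 \int_D g r\,dx\right) = \frac{1}{q_1}\int_D (x - \mu_1) g r\,dx = \frac{1}{q_1} S_1$$

so $\dot\theta_1 = \frac{1}{q_1} Q_1^{-1} S_1$. Similarly, $\dot\theta_2 = -\frac{1}{q_2} Q_2^{-1} S_2$, so the theorem follows. □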

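We cited the literature for a proof of Lemma 2. However, this is a key lemma, so
we also give a proof in this appendix, beginning with a general lemma. See [1] for an
introduction to differential forms, Lie derivatives, Stokes' theorem, etc.

Lemma 7. Let $\Omega \subseteq \mathbb{R}^d$ be a d-manifold with piecewise smooth boundary $\partial\Omega$. Let $V$ be a
vector field on $\mathbb{R}^d$ with flow $F_t : \mathbb{R}^d \to \mathbb{R}^d$ and let $\Omega_t = F_t(\Omega)$. If $c(t) = \int_{\Omega_t} \omega$ for some
differential d-form $\omega$ then

$$\left.\frac{dc}{dt}\right|_{t=0} = \int_{\partial\Omega} i_V \omega \tag{21}$$

$$\left.\frac{d^2c}{dt^2}\right|_{t=0} = \int_{\partial\Omega} i_V\, d\, i_V \omega \tag{22}$$

where $d$ is the exterior derivative and $i_V \alpha$ is the interior product of $V$ and any differential
form $\alpha$.

Proof. We first note that

$$\left.\frac{d}{dt}\right|_{t=0} F_t^* \omega = \mathcal{L}_V \omega$$

where $\mathcal{L}_V$ is the Lie derivative of $V$. So using Cartan's formula $\mathcal{L}_V = d\, i_V + i_V d$, the fact
that $d\omega = 0$ (since $\omega$ is top-dimensional) and Stokes' theorem we have

$$\left.\frac{dc}{dt}\right|_{t=0} = \left.\frac{d}{dt}\right|_{t=0} \int_{\Omega_t} \omega = \left.\frac{d}{dt}\right|_{t=0} \int_\Omega F_t^* \omega = \int_\Omega \mathcal{L}_V \omega = \int_\Omega (d\, i_V + i_V d)\omega = \int_\Omega d\, i_V \omega = \int_{\partial\Omega} i_V \omega.$$

Similar reasoning gives

$$\frac{dc}{dt} = \left.\frac{d}{ds}\right|_{s=0} \int_\Omega F_s^*(F_t^* \omega) = \int_\Omega \mathcal{L}_V (F_t^* \omega) = \int_{\partial\Omega} i_V (F_t^* \omega).$$

Now, if $v_1, \ldots, v_{d-1}$ are any vector fields on $\partial\Omega$ then

$$\begin{aligned}
\left.\frac{d}{dt}\right|_{t=0} \left(i_V (F_t^* \omega)\right)(v_1, \ldots, v_{d-1}) &= \left.\frac{d}{dt}\right|_{t=0} (F_t^* \omega)(V, v_1, \ldots, v_{d-1}) \\
&= (\mathcal{L}_V \omega)(V, v_1, \ldots, v_{d-1}) \\
&= (i_V \mathcal{L}_V \omega)(v_1, \ldots, v_{d-1})
\end{aligned}$$

so

$$\left.\frac{d}{dt}\right|_{t=0} i_V (F_t^* \omega) = i_V \mathcal{L}_V \omega.$$

Therefore

$$\left.\frac{d^2c}{dt^2}\right|_{t=0} = \left.\frac{d}{dt}\right|_{t=0} \int_{\partial\Omega} i_V (F_t^* \omega) = \int_{\partial\Omega} i_V \mathcal{L}_V \omega = \int_{\partial\Omega} i_V (d\, i_V + i_V d)\omega = \int_{\partial\Omega} i_V\, d\, i_V \omega.$$

□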

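We can now give our proof of Lemma 2.

Alternative proof of Lemma 2. Let $x_1, \ldots, x_d$ be the standard co-ordinates on $\mathbb{R}^d$ so that
$dx_1 \wedge \ldots \wedge dx_d$ is the volume form. We will apply Lemma 7 with $\Omega = U_i$ for $i$ either 1
or 2, $V = gN$ and $\omega = \rho\, dx_1 \wedge \ldots \wedge dx_d$. Then (21) becomes

$$\left.\frac{dc}{dt}\right|_{t=0} = \int_{\partial\Omega} i_V \omega = \int_{\partial U_i} i_{gN}(\rho\, dx_1 \wedge \ldots \wedge dx_d) = \int_{\partial U_i} g\rho\,(i_N dx_1 \wedge \ldots \wedge dx_d) = \pm\int_D g\rho\,dx$$

since $\mathrm{Supp}(g) \cap \partial U_i \subseteq D$ and $i_N dx_1 \wedge \ldots \wedge dx_d$ is the volume form on $\partial U_1$ and minus the
volume form on $\partial U_2$ (recall that $N$ is the unit normal vector field on $D$ which points out
of $U_1$ and into $U_2$).

If $N = (N_1, \ldots, N_d)$ then

$$i_V \omega = \sum_{k=1}^d (-1)^{k-1} g\rho N_k\, dx_1 \wedge \ldots \wedge \widehat{dx_k} \wedge \ldots \wedge dx_d$$

where the hat indicates that the term is excluded. Therefore

$$\begin{aligned}
d\, i_V \omega &= \sum_{j,k=1}^d (-1)^{k-1} \frac{\partial}{\partial x_j}(g\rho N_k)\, dx_j \wedge dx_1 \wedge \ldots \wedge \widehat{dx_k} \wedge \ldots \wedge dx_d \\
&= \sum_{k=1}^d (-1)^{k-1} \frac{\partial}{\partial x_k}(g\rho N_k)\, dx_k \wedge dx_1 \wedge \ldots \wedge \widehat{dx_k} \wedge \ldots \wedge dx_d \\
&= \sum_{k=1}^d \frac{\partial}{\partial x_k}(g\rho N_k)\, dx_1 \wedge \ldots \wedge dx_d \\
&= \left(\nabla \cdot (g\rho N)\right) dx_1 \wedge \ldots \wedge dx_d
\end{aligned}$$

so

$$i_V\, d\, i_V \omega = \left(g\,\nabla \cdot (g\rho N)\right)(i_N dx_1 \wedge \ldots \wedge dx_d).$$

Substituting this into (22) then completes the proof of the lemma since $g$ is 0 on $\partial U_i$ except
perhaps in $D$ and $i_N dx_1 \wedge \ldots \wedge dx_d$ is the volume form on $\partial U_1$ and minus the volume form
on $\partial U_2$. □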